By David C. Moss
In our April issue, we explained how you and your users can use Entity-Relationship modeling to create a logical data model of your organization's information needs. In this article, we'll look at how to turn your Entity-Relationship (E-R) model--a logical model--into a physical data model using Designer/2000. This physical data model will become the basis for creating tables, columns, and foreign keys--ultimately generating Data Definition Language (DDL) scripts to be used in creating your database.
As we've mentioned, determining when you're ready to begin translating your logical models into physical models is a joint effort between you and your users. All parties must agree that the scope of the application will satisfy the business needs of your organization. The most difficult part of reaching consensus is the tightening of the application scope to achieve time and budget constraints. Everyone needs to understand that changes to this scope will impact the project deadline and budget.
Before you begin translating your logical data model into a physical data model, you must complete a number of review tasks. You need to perform quality checking on the E-R model, verify the business need for all attributes, determine unique identifiers for each entity, and perform final quality checking. Most important, you need to get final approval of the model from your users. Let's discuss each of these tasks.
Our first item to review is the E-R model itself. As you'll recall, this model shows each item of significance in your organization or application and how it relates to another piece of information. The main components of an E-R model are entities--depicted as rounded boxes--and relationships--depicted as solid or broken lines between two entities.
During the Strategy and Analysis stages of Oracle's CASE*Method, it's best to try to keep an open mind about possible physical outcomes of the E-R model by showing precisely what you discover in the organization you're studying. This process is called divergence. As you move through the Analysis stage, you'll collect most of the detailed information that will make its way into your application. Next, you'll conduct the process known as convergence.
Converging your models involves determining if there are similar or repeating structures that could be better handled by one structure. As in all techniques, the consolidation of model objects has its positive and negative sides. On the positive side, convergence can result in better understanding of your model as well as fewer, more efficient structures. On the negative side, you can go too far and consolidate a well-understood model into a handful of highly complex structures that developers can't easily understand. As always, remember your audience: Your models need to make sense, but they need to work, too!
Another helpful and necessary process prior to translating logical models to physical models is called normalization. This process results in entities that have no repeating groups, the same kinds of values assigned to attributes or columns, a distinct name, and distinct and uniquely identifiable rows.
An attribute is any detail that serves to qualify, identify, classify, quantify, or express the state of an entity. Examples include such things as name, address, city, state, and zip code for a customer. During the Strategy and Analysis stages, you'll be collecting hundreds, perhaps thousands of attributes. The process of normalization of entities really comes down to successfully grouping attributes and creating entities that conform to the rules of normalization.
If you're involved in a Data Warehouse project or are moving from a non-relational to a relational database, one of your biggest challenges is to determine which attributes are duplicates of other attributes. This is not an easy task, since the attributes may have different names and data types. We strongly recommend using the Text tab in the Edit Entity dialog box to record the origin of each attribute as you enter it into your new system definition. This will not only help in identifying duplicates, but will make attributes much easier to map for data conversion.
Another type of attribute to look for carefully is a derived attribute. Most often, these attributes are the result of calculations made by an application. It's often quite convenient to calculate these derived attributes once and refer to them many times in an application. However, you must always keep in mind that, in the ideal world, these calculations could be made on the fly. Derived attributes, by necessity, must have some application code to routinely update them; and all applications using the derived attributes must take into account the freshness of the data.
Figure A shows the Edit Entity dialog box for Attributes for the Customer entity. Note that the tabs at the top of this dialog box--Definition, Synonyms, UIDs (unique identifiers), Attributes, Att (attribute) Detail, Att Values, and Text--allow you to add details for the entity.
A UID is the combination of attributes and/or relationships that makes each occurrence of an entity unique. Usually a UID consists of one or more of the attributes for an entity. In other cases, a UID consists of a combination of attributes and relationships. In rare cases, a UID can consist entirely of relationships to other entities. Figure B shows the UID for the Account entity. Note that it utilizes its relationships to the Customer and Loan Program entities. Essentially, each occurrence of a Customer with a specific Loan Program creates a new occurrence of the Account entity.
Easily the most controversial topic in E-R modeling is the use of surrogate versus natural unique identifiers. Natural UIDs consist of all the attributes necessary to uniquely identify an entity. In the case of a plan for aircraft design, this may involve 10 or more attributes, perhaps hundreds of characters long. Therefore, this natural UID will be extremely cumbersome as a primary key in a table and will cause considerable grief in maintenance.
The solution is to use a surrogate key--usually a uniquely generated sequential number. The Oracle RDBMS has a built-in mechanism to generate these sequential keys. You can indicate a surrogate key in the Analysis stage by adding an attribute, usually "ID," with a data type of Number and a length of 10 to 16, depending on the number of unique occurrences you expect to have over the life of the system. We recommend that you make decisions on surrogate keys near the end of the Analysis stage, once you've fully analyzed and normalized your entities and attributes.
You need to perform some final quality checks before translating your Entity model. You should ask yourself the following question: Does each entity have a name, a meaningful description, at least two significant attributes, a unique identifier, and one or more functions responsible for its creation? If not, then you're not finished!
Also, does each attribute have a meaningful name, a concise description, a data type, its optionality specified, and an indication it takes part in the unique identifier? If not, keep working!
Finally, are both ends of each relationship named? Can the owner read and understand the relationship in both directions? Are the optionality and degree specified? Is there an indication that it takes part in the unique identifier? If not, drink that extra cup of coffee and finish up!
By the way, you'll want to look at Figure C to see what a final E-R diagram looks like. Please note that some of the attributes are displayed under each entity name. If there were more space on the diagram, all of the attributes would be displayed. Also note that the O in front of an attribute indicates that the attribute is optional; the * indicates that it's mandatory; and the # indicates that it's part of the primary key. The short horizontal lines on the relationship line from the Account to the Customer entity and from the Account to the Loan Program entity indicate that these relationships are part of the UID.
So, how do you know when you're finally ready to begin translating your logical model into a physical model? As we've mentioned, you develop and complete the logical model during the Strategy and Analysis stages. These two stages mark the boundaries for deciding the scope of the implementation--what will become code and what won't. After you've made that determination, the Design stage is the appropriate time to do the logical-to-physical translation.
One of the best things about Designer/2000 is its automation features. And one of the best automation tools is the Database Design Wizard. Once you've performed all the quality checking, you can run the Database Design Wizard utility from the E-R Diagrammer, the Repository Object Navigator, or several other places. From there, you simply select which entities you want to translate into tables. The Database Design Wizard does the rest of the work by creating all of the table, column, primary key, and foreign key definitions for you in just minutes. In most cases, each entity you defined will become a table; each attribute will become a column in a table; each UID will become a primary key; and relationships to entities will become foreign keys.
We've spent almost the entire article talking about what we need to do before we run a utility program in Designer/2000. Funny thing is, that's exactly how things work in real life CASE projects: You spend quality time getting user acceptance and then do the automated things right the first time.
Of course, just running the Database Design Wizard isnÃt all there is to database design. In fact, it's just the start. With your table, column, and foreign key definitions, there's much detail to add before you can move on to module definition--a topic we'll address in a future article. In next month's article, we'll discuss what you need to do in order to successfully run the Application Design Wizard.
If you'd like to find out more about Oracle's approach to database design, please read CASE*Method - Entity-Relationship Modeling by Richard Barker (Oracle/Addison-Wesley, 1990). Appendix F is devoted to Relational Database Design.
David Moss is a managing consultant with TrueNorth Consulting, Inc. He has been using, teaching, and preaching CASE methods and tools for as long as he can remember, including over three years with Oracle Consulting Group. You can reach him by phone at (503) 220-1790 or by E-mail at truenrth@ix.netcom.com.
[Return to Index for Exploring Oracle Developer/2000 and Designer/2000 - June 1996]
Copyright (c) 1996 The Cobb Group, a division of Ziff-Davis Publishing Company. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Ziff-Davis Publishing Company is prohibited. The Cobb Group and The Cobb Group logo are trademarks of Ziff-Davis Publishing Company.